A Mouth Full of Words: Visually Consistent Acoustic Redubbing
This paper introduces a method for automatic redubbing of video that exploits the many-to-many mapping of phoneme sequences to lip movements, modelled as dynamic visemes [1]. For a given utterance, the corresponding dynamic viseme sequence is sampled to construct a graph of possible phoneme sequences that synchronize with the video. When composed with a pronunciation dictionary and language model, this produces a vast number of word sequences that are in sync with the original video, literally putting plausible words into the mouth of the speaker. We demonstrate that traditional, one-to-many, static visemes lack the flexibility for this application, as they produce significantly fewer word sequences. This work explores the natural ambiguity in visual speech and offers insights for automatic speech recognition, highlighting the importance of language modelling.
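The composition described above, in which each viseme expands to several candidate phonemes and a lexicon then filters the resulting sequences, can be sketched as follows. The viseme classes, phoneme entries and the tiny lexicon are illustrative toys, not the paper's actual dynamic-viseme model or dictionary:

```python
from itertools import product

# Toy many-to-many mapping from viseme classes to candidate phonemes.
# The class names and entries are invented for illustration only.
viseme_to_phonemes = {
    "V1": ["p", "b", "m"],
    "V2": ["a", "o"],
    "V3": ["t", "d", "n"],
}

# Toy pronunciation "dictionary": phoneme sequences that form valid words.
lexicon = {("b", "a", "t"), ("m", "a", "n"), ("p", "o", "d")}

def word_candidates(viseme_seq):
    """Expand a viseme sequence into every phoneme sequence it could
    correspond to, then keep only those the lexicon recognises."""
    choices = [viseme_to_phonemes[v] for v in viseme_seq]
    return {seq for seq in product(*choices) if seq in lexicon}

# Every surviving word is lip-synchronous with the same viseme sequence.
print(sorted(word_candidates(["V1", "V2", "V3"])))
```

A real system would represent this expansion as a graph composed with a language model rather than enumerating the full product, but the filtering idea is the same.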
The Effect of Speaking Rate on Audio and Visual Speech
The speed at which an utterance is spoken affects both the duration of the speech and the position of the articulators. Consequently, the sounds that are produced are modified, as are the position and appearance of the lips, teeth, tongue and other visible articulators. We describe an experiment designed to measure the effect of variable speaking rate on audio and visual speech by comparing sequences of phonemes and dynamic visemes appearing in the same sentences spoken at different speeds. We find that both audio and visual speech production are affected by varying the rate of speech; however, the effect is significantly more prominent in visual speech.
Hand Keypoint Detection in Single Images using Multiview Bootstrapping
We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrapping: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The noisy detections are then triangulated in 3D using multiview geometry or marked as outliers. Finally, the reprojected triangulations are used as new labeled training data to improve the detector. We repeat this process, generating more labeled data in each iteration. We derive a result analytically relating the minimum number of views to achieve target true and false positive rates for a given detector. The method is used to train a hand keypoint detector for single images. The resulting keypoint detector runs in realtime on RGB images and has accuracy comparable to methods that use depth sensors. The single view detector, triangulated over multiple views, enables 3D markerless hand motion capture with complex object interactions. Comment: CVPR 201
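One round of the bootstrapping loop above can be sketched with the camera geometry abstracted away: here each "view" directly reports a noisy scalar estimate, and 3D triangulation is replaced by a robust consensus (the median). The tolerance and detection values are illustrative, not the paper's setup:

```python
import statistics

def bootstrap_round(view_detections, inlier_tol=1.0):
    """Fuse per-view detections, flag outliers, and return the consensus
    estimate plus the inlier views whose labels will be used to retrain."""
    # Stand-in for multiview triangulation: a robust consensus estimate.
    consensus = statistics.median(view_detections)
    # Views that disagree with the consensus are marked as outliers and
    # dropped; the fused estimate becomes the new training label.
    inliers = [d for d in view_detections if abs(d - consensus) <= inlier_tol]
    return consensus, inliers

# Three views agree; the fourth (25.0) is a gross detection error.
fused, inliers = bootstrap_round([10.1, 9.8, 10.3, 25.0])
print(fused, inliers)
```

Repeating this round with a detector retrained on the fused labels is what lets each iteration generate more, and cleaner, training data.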
Predicting Head Pose from Speech with a Conditional Variational Autoencoder
Natural movement plays a significant role in realistic speech animation. Numerous studies have demonstrated the contribution visual cues make to the degree to which we, as human observers, find an animation acceptable. Rigid head motion is one visual mode that universally co-occurs with speech, so it is a reasonable strategy to seek a transformation from the speech mode to predict the head pose. Several previous authors have shown that prediction is possible, but experiments are typically confined to rigidly produced dialogue. Natural, expressive, emotive and prosodic speech exhibits motion patterns that are far more difficult to predict, with considerable variation in expected head pose. Recently, Long Short-Term Memory (LSTM) networks have become an important tool for modelling speech and natural language tasks. We employ Deep Bi-Directional LSTMs (BLSTMs), capable of learning long-term structure in language, to model the relationship between speech and rigid head motion. We then extend our model by conditioning on prior motion. Finally, we introduce a generative head motion model, conditioned on audio features using a Conditional Variational Autoencoder (CVAE). Each approach mitigates the problems of the one-to-many mapping that a speech-to-head-pose model must accommodate.
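The generative step of a CVAE handles the one-to-many mapping by sampling: at test time a latent code is drawn from the prior and decoded together with the audio condition, so the same audio can yield different plausible poses. The tiny linear "decoder" and its weights below are placeholders, not a trained model:

```python
import random

def generate_pose(audio_feats, decode, latent_dim=2, seed=None):
    """Sample a latent code from the prior and decode it jointly with
    the audio condition into a head-pose value."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # prior sample
    return decode(z + audio_feats)  # decoder sees latent code + condition

# Placeholder decoder: a weighted sum producing one pose value.
weights = [0.5, -0.3, 1.0, 0.2]
decode = lambda x: sum(w * v for w, v in zip(weights, x))

# Different latent samples give different, equally valid poses for the
# same audio, which is how the one-to-many mapping is accommodated.
p0 = generate_pose([0.8, 0.1], decode, seed=0)
p1 = generate_pose([0.8, 0.1], decode, seed=1)
print(p0, p1)
```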
Audio-to-Visual Speech Conversion using Deep Neural Networks
We study the problem of mapping from acoustic to visual speech with the goal of generating accurate, perceptually natural speech animation automatically from an audio speech signal. We present a sliding-window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset. Overlapping visual predictions are averaged to generate continuous, smoothly varying speech animation. We outperform a baseline HMM inversion approach in both objective and subjective evaluations and perform a thorough analysis of our results.
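The overlap-averaging step described above can be sketched as follows: each window's predictions are accumulated into a per-frame buffer and normalised by how many windows covered that frame. The window length, hop size and feature values are illustrative (real visual features would be vectors per frame, not scalars):

```python
def average_overlapping(windows, window_len, hop):
    """Average per-window predictions into one smooth per-frame track."""
    n_frames = (len(windows) - 1) * hop + window_len
    sums = [0.0] * n_frames
    counts = [0] * n_frames
    for w_idx, window in enumerate(windows):
        start = w_idx * hop  # frame where this window begins
        for i, value in enumerate(window):
            sums[start + i] += value
            counts[start + i] += 1
    # Each frame is the mean of every window prediction that covered it.
    return [s / c for s, c in zip(sums, counts)]

# Two length-3 windows with hop 1 overlap on two interior frames.
result = average_overlapping([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]], 3, 1)
print(result)
```

Averaging the overlapping regions is what removes frame-to-frame discontinuities at window boundaries.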
ADEnosine testing to determine the need for Pacing Therapy with the additional use of an Implantable Loop Recorder (ADEPT-ILR)
MD Thesis.
Aim: To determine the efficacy of permanent pacing in preventing syncopal episodes in patients with unexplained syncope and a positive adenosine test, via a randomised double-blind placebo-controlled crossover trial with an accompanying negative adenosine test implantable loop recorder arm.
Methods: Individuals presenting to secondary care with unexplained syncope underwent adenosine testing as defined by the European Society of Cardiology. Those with a positive test had a permanent pacemaker implant and were randomised to pacemaker on or off for 6 months before crossing over to the alternative mode. Those with a negative adenosine test underwent a loop recorder implantation. The primary outcome was cumulative syncope burden as reported by monthly syncope diaries.
Results: Fifty-two patients were included in the trial and had adenosine testing. There were 35 positive adenosine tests (67%) and 17 negative adenosine tests (33%). There was a mean of 0.4 fewer syncopal episodes per patient during the pacemaker-on period compared to the pacemaker-off period (1.2 vs. 1.6 episodes), with a higher relative risk of syncope in the pacemaker-off period compared with the pacemaker-on period (RR 2.1, 95% CI 1.0 to 4.4, p=0.048). In the adenosine-negative arm, one patient developed bradycardia requiring permanent pacing, giving a negative predictive value of the adenosine test for identifying a bradycardia pacing indication of 0.94 (95% CI 0.69 to 1.0).
Conclusion: Permanent pacing reduces the syncope burden in patients with unexplained syncope and a positive adenosine test, whilst a high negative predictive value demonstrates the low likelihood of a missed opportunity for pacemaker implantation. Our study suggests that a positive adenosine test unmasks bradycardia pacing indications without the need for prolonged and invasive investigations, providing opportunity for early and effective intervention.
Environment, alcohol intoxication and overconfidence: evidence from a lab-in-the-field experiment
Alcohol has long been known as the demon drink, an epithet owed to the numerous social ills it is associated with. Our lab-in-the-field experiment assesses the extent to which changes in intoxication and an individual's environment lead to changes in overconfidence or cognitive ability that are, in turn, often linked to problematic behaviours. Results indicate that it is the joint effect of being intoxicated in a bar, rather than simply being intoxicated, that matters. Subjects systematically underestimated the magnitude of their behavioural changes, suggesting that they cannot be held fully accountable for their actions.